Overview

Dataset statistics

                                 Dataset A    Dataset B
Number of variables              12           12
Number of observations           446          446
Missing cells                    428          429
Missing cells (%)                8.0%         8.0%
Duplicate rows                   0            0
Duplicate rows (%)               0.0%         0.0%
Total size in memory             45.3 KiB     45.3 KiB
Average record size in memory    104.0 B      104.0 B
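The missing-cell percentage is taken over every cell of the table, not over rows. For Dataset A that is 428 / (446 x 12) = 428 / 5352, or about 8.0%. The sketch below does the same bookkeeping with the standard library. It assumes rows are plain dicts and that None marks a missing value. The report itself was built from a DataFrame, so treat this as an illustration, not its implementation.

```python
from collections import Counter


def table_stats(rows, columns):
    """Dataset-level counts for a table given as a list of dicts.

    Assumption: a missing cell is stored as None (or the key is absent).
    """
    n_vars = len(columns)
    n_obs = len(rows)
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    # Every extra copy of an identical row counts as one duplicate.
    seen = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())
    total_cells = n_vars * n_obs
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
    }


# Worked check with the counts reported for Dataset A.
print(f"{428 / (446 * 12):.1%}")  # 8.0%
```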

Variable types

               Dataset A    Dataset B
Numeric        5            5
Categorical    4            4
Text           3            3

Alerts

Type      Dataset A                               Dataset B
Missing   Age has 89 (20.0%) missing values       Age has 91 (20.4%) missing values
Missing   Cabin has 337 (75.6%) missing values    Cabin has 337 (75.6%) missing values
Unique    PassengerId has unique values           PassengerId has unique values
Unique    Name has unique values                  Name has unique values
Zeros     SibSp has 298 (66.8%) zeros             SibSp has 304 (68.2%) zeros
Zeros     Parch has 341 (76.5%) zeros             Parch has 341 (76.5%) zeros
Zeros     Fare has 5 (1.1%) zeros                 Fare has 7 (1.6%) zeros
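Each alert pairs a count with its share of all 446 rows. For example, Age in Dataset A has 89 / 446, or about 20.0%, missing. The sketch below reproduces the wording of the three alert types. The thresholds are hypothetical parameters because the report does not state the values it used. It only shows that a 1.1% share of zeros was enough to trigger a Zeros alert.

```python
def column_alerts(name, values, missing_threshold=0.0, zeros_threshold=0.0):
    """Alert messages in the report's wording for one column.

    Hypothetical thresholds: an alert fires when the share of missing values
    or zeros is strictly greater than the threshold. None marks a missing value.
    """
    n = len(values)
    present = [v for v in values if v is not None]
    alerts = []
    n_missing = n - len(present)
    if n_missing and n_missing / n > missing_threshold:
        alerts.append(f"{name} has {n_missing} ({n_missing / n:.1%}) missing values")
    if n and len(set(present)) == n:  # no missing values and no repeats
        alerts.append(f"{name} has unique values")
    n_zeros = sum(1 for v in present if v == 0)
    if n_zeros and n_zeros / n > zeros_threshold:
        alerts.append(f"{name} has {n_zeros} ({n_zeros / n:.1%}) zeros")
    return alerts


print(column_alerts("Fare", [0.0] * 5 + [7.25] * 441))  # ['Fare has 5 (1.1%) zeros']
```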

Reproduction

                          Dataset A                     Dataset B
Analysis started          2024-05-07 16:31:49.328736    2024-05-07 16:31:53.356932
Analysis finished         2024-05-07 16:31:53.355822    2024-05-07 16:31:57.261271
Duration                  4.03 seconds                  3.9 seconds
Software version          ydata-profiling v0.0.dev0     ydata-profiling v0.0.dev0
Configuration             config.json                   config.json
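Duration is the difference between the two timestamps. For Dataset A, 16:31:53.355822 - 16:31:49.328736 = 4.027086 s, which is reported as 4.03 seconds. For Dataset B it is 3.904339 s, which the report shows as 3.9 seconds with the trailing zero dropped. Here is a small check using only the standard library:

```python
from datetime import datetime


def duration_seconds(started, finished):
    """Elapsed time between two 'YYYY-MM-DD HH:MM:SS.ffffff' timestamps."""
    start = datetime.fromisoformat(started)
    end = datetime.fromisoformat(finished)
    return (end - start).total_seconds()


a = duration_seconds("2024-05-07 16:31:49.328736", "2024-05-07 16:31:53.355822")
b = duration_seconds("2024-05-07 16:31:53.356932", "2024-05-07 16:31:57.261271")
print(f"{a:.2f} s, {b:.2f} s")  # 4.03 s, 3.90 s
```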

Variables

PassengerId
Real number (ℝ)

                Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            432.69283    446.57848
Minimum         1            1
Maximum         885          890
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      1            1
5th percentile               40           46.25
Q1                           202.5        223.25
Median                       415.5        447
Q3                           668.75       669.75
95th percentile              847.75       857.75
Maximum                      885          890
Range                        884          889
Interquartile range (IQR)    466.25       446.5
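Range is Maximum minus Minimum (885 - 1 = 884 for Dataset A). IQR is Q3 minus Q1 (668.75 - 202.5 = 466.25). The percentiles are consistent with linear interpolation between sorted values, which is the default in NumPy and pandas. How the report actually computes them is an assumption, but that rule does reproduce the Age 5th percentile of 4.8 reported for Dataset A. The sketch below uses the standard library, where `statistics.quantiles(..., method="inclusive")` applies the same interpolation:

```python
import statistics


def quantile_stats(values):
    """Quantile block using linear interpolation between sorted values.

    statistics.quantiles(..., method="inclusive") returns the 1st..99th
    percentiles with the same rule as NumPy's default 'linear' method.
    Needs at least two values.
    """
    pct = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = min(values), max(values)
    q1, q3 = pct[24], pct[74]  # index k - 1 holds the k-th percentile
    return {
        "minimum": lo,
        "p5": pct[4],
        "q1": q1,
        "median": pct[49],
        "q3": q3,
        "p95": pct[94],
        "maximum": hi,
        "range": hi - lo,
        "iqr": q3 - q1,
    }


print(quantile_stats([1, 2, 3, 4, 5]))
```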

Descriptive statistics

                                   Dataset A       Dataset B
Standard deviation                 260.1043        258.7778
Coefficient of variation (CV)      0.60112921      0.57946769
Kurtosis                           -1.2163897      -1.186486
Mean                               432.69283       446.57848
Median Absolute Deviation (MAD)    230.5           223.5
Skewness                           0.086421156     0.02786659
Sum                                192981          199174
Variance                           67654.245       66965.948
Monotonicity                       Not monotonic   Not monotonic
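The coefficient of variation is the standard deviation divided by the mean (260.1043 / 432.69283, about 0.6011 for Dataset A). The variance is the squared standard deviation. The sketch below computes the whole block with the standard library. It assumes the report uses the same estimators as pandas:

- sample variance with an n - 1 denominator;
- bias-corrected skewness and excess kurtosis, as in `Series.skew` and `Series.kurt`;
- MAD as the median absolute deviation from the median.

The monotonicity labels are illustrative and do not cover the report's full set.

```python
import math
import statistics


def descriptive_stats(values):
    """Descriptive block for a list of numbers (at least 4, not all equal)."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # n - 1 denominator
    std = math.sqrt(variance)
    # Biased central moments feed the bias-corrected estimators below.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * m4 / m2 ** 2 - 3 * (n - 1))
    median = statistics.median(values)
    mad = statistics.median([abs(x - median) for x in values])
    pairs = list(zip(values, values[1:]))
    if all(a <= b for a, b in pairs):
        monotonicity = "Increasing"
    elif all(a >= b for a, b in pairs):
        monotonicity = "Decreasing"
    else:
        monotonicity = "Not monotonic"
    return {
        "std": std,
        "cv": std / mean,
        "kurtosis": kurtosis,
        "mean": mean,
        "mad": mad,
        "skewness": skewness,
        "sum": sum(values),
        "variance": variance,
        "monotonicity": monotonicity,
    }


stats = descriptive_stats([3, 1, 2, 5, 4])
print(round(stats["kurtosis"], 6), stats["mad"], stats["monotonicity"])  # -1.2 1 Not monotonic
```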
Figure: histogram with fixed size bins (bins=50)
Common Values (Dataset A)
Value                 Count    Frequency (%)
799                   1        0.2%
331                   1        0.2%
637                   1        0.2%
654                   1        0.2%
156                   1        0.2%
79                    1        0.2%
350                   1        0.2%
837                   1        0.2%
712                   1        0.2%
828                   1        0.2%
Other values (436)    436      97.8%

Common Values (Dataset B)
Value                 Count    Frequency (%)
39                    1        0.2%
360                   1        0.2%
237                   1        0.2%
797                   1        0.2%
391                   1        0.2%
876                   1        0.2%
437                   1        0.2%
887                   1        0.2%
725                   1        0.2%
286                   1        0.2%
Other values (436)    436      97.8%
Smallest values (Dataset A)
Value    Count    Frequency (%)
1        1        0.2%
3        1        0.2%
4        1        0.2%
5        1        0.2%
6        1        0.2%
9        1        0.2%
10       1        0.2%
12       1        0.2%
14       1        0.2%
15       1        0.2%

Smallest values (Dataset B)
Value    Count    Frequency (%)
1        1        0.2%
6        1        0.2%
8        1        0.2%
9        1        0.2%
10       1        0.2%
11       1        0.2%
13       1        0.2%
14       1        0.2%
20       1        0.2%
22       1        0.2%
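The smallest-values tables list the lowest distinct values in ascending order. Their percentages use all 446 rows as the denominator, including missing ones. For Age in Dataset A, the value 1 occurs 6 times and is shown as 1.3% (6 / 446), not 1.7% (6 / 357). Here is a sketch of that table, assuming None marks a missing value:

```python
from collections import Counter


def smallest_values(values, k=10):
    """The k smallest distinct values as (value, count, share) rows.

    Missing values (None) are skipped when ranking but still count toward
    the denominator, matching the percentages shown in the report.
    """
    n = len(values)
    counts = Counter(v for v in values if v is not None)
    return [(v, counts[v], f"{counts[v] / n:.1%}") for v in sorted(counts)[:k]]


print(smallest_values([5, 1, 1, None, 3], k=2))  # [(1, 2, '40.0%'), (3, 1, '20.0%')]
```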

Survived
Categorical

                Dataset A    Dataset B
Distinct        2            2
Distinct (%)    0.4%         0.4%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Dataset A: 0 (272), 1 (174)
Dataset B: 0 (274), 1 (172)

Length

                       Dataset A    Dataset B
Max length             1            1
Median length          1            1
Mean length            1            1
Min length             1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       446          446
Distinct characters    2            2
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
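The report groups every character under (unknown) for category, script and block. As an independent cross-check, and not the report's own method, Python's `unicodedata` module returns the Unicode general category of each character. It does not expose scripts or blocks. Survived contains only the digits 0 and 1, which fall in category Nd (decimal digit):

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count the characters of a text column by Unicode general category."""
    return Counter(unicodedata.category(ch) for text in values for ch in text)


print(category_counts(["0", "1", "1"]))     # Counter({'Nd': 3})
print(category_counts(["male", "female"]))  # Counter({'Ll': 10})
```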

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    0            0
2nd row    0            0
3rd row    0            0
4th row    0            1
5th row    1            1

Common Values

Dataset A
Value    Count    Frequency (%)
0        272      61.0%
1        174      39.0%

Dataset B
Value    Count    Frequency (%)
0        274      61.4%
1        172      38.6%
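Each Common Values row is a count plus its share of all rows. For Survived = 0 in Dataset A that is 272 / 446, or about 61.0%. The sketch below uses `collections.Counter`. Missing values (None) would be counted as a value of their own, much like the (Missing) row in the Embarked table.

```python
from collections import Counter


def frequency_table(values):
    """(value, count, share) rows, most frequent first; shares use all rows."""
    n = len(values)
    return [(v, c, f"{c / n:.1%}") for v, c in Counter(values).most_common()]


survived_a = ["0"] * 272 + ["1"] * 174  # counts reported for Dataset A
print(frequency_table(survived_a))  # [('0', 272, '61.0%'), ('1', 174, '39.0%')]
```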

Length

Figure: histogram of lengths of the category

Common Values (Plot)

Figure: bar charts of the common values for Dataset A and Dataset B

Most occurring characters

Dataset A
Character    Count    Frequency (%)
0            272      61.0%
1            174      39.0%

Dataset B
Character    Count    Frequency (%)
0            274      61.4%
1            172      38.6%

Most occurring categories

(unknown): 446 (100.0%) in Dataset A; 446 (100.0%) in Dataset B

Most frequent character per category

(unknown): same counts as Most occurring characters, since every character falls in the single (unknown) group

Most occurring scripts

(unknown): 446 (100.0%) in Dataset A; 446 (100.0%) in Dataset B

Most frequent character per script

(unknown): same counts as Most occurring characters

Most occurring blocks

(unknown): 446 (100.0%) in Dataset A; 446 (100.0%) in Dataset B

Most frequent character per block

(unknown): same counts as Most occurring characters

Pclass
Categorical

                Dataset A    Dataset B
Distinct        3            3
Distinct (%)    0.7%         0.7%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Dataset A: 3 (251), 1 (105), 2 (90)
Dataset B: 3 (245), 1 (113), 2 (88)

Length

                       Dataset A    Dataset B
Max length             1            1
Median length          1            1
Mean length            1            1
Min length             1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       446          446
Distinct characters    3            3
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    3            3
2nd row    3            3
3rd row    3            2
4th row    3            3
5th row    3            1

Common Values

Dataset A
Value    Count    Frequency (%)
3        251      56.3%
1        105      23.5%
2        90       20.2%

Dataset B
Value    Count    Frequency (%)
3        245      54.9%
1        113      25.3%
2        88       19.7%

Length

Figure: histogram of lengths of the category

Common Values (Plot)

Figure: bar charts of the common values for Dataset A and Dataset B

Most occurring characters

Dataset A
Character    Count    Frequency (%)
3            251      56.3%
1            105      23.5%
2            90       20.2%

Dataset B
Character    Count    Frequency (%)
3            245      54.9%
1            113      25.3%
2            88       19.7%

Most occurring categories

(unknown): 446 (100.0%) in Dataset A; 446 (100.0%) in Dataset B

Most frequent character per category

(unknown): same counts as Most occurring characters

Most occurring scripts

(unknown): 446 (100.0%) in Dataset A; 446 (100.0%) in Dataset B

Most frequent character per script

(unknown): same counts as Most occurring characters

Most occurring blocks

(unknown): 446 (100.0%) in Dataset A; 446 (100.0%) in Dataset B

Most frequent character per block

(unknown): same counts as Most occurring characters

Name
Text

                Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

                       Dataset A    Dataset B
Max length             82           82
Median length          49           48
Mean length            26.753363    27.269058
Min length             12           12

Characters and Unicode

                       Dataset A    Dataset B
Total characters       11932        12162
Distinct characters    59           60
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

Unique

              Dataset A    Dataset B
Unique        446          446
Unique (%)    100.0%       100.0%

Sample

           Dataset A                       Dataset B
1st row    Ibrahim Shawah, Mr. Yousseff    Vander Planke, Miss. Augusta Maria
2nd row    Skoog, Mr. Wilhelm              Kiernan, Mr. Philip
3rd row    Gustafsson, Mr. Johan Birger    Butler, Mr. Reginald Fenton
4th row    Ekstrom, Mr. Johan              Bing, Mr. Lee
5th row    Chip, Mr. Chang                 Frauenthal, Mrs. Henry William (Clara Heinsheimer)

Words (Dataset A)
Value                 Count    Frequency (%)
mr                    256      14.1%
miss                  101      5.6%
mrs                   63       3.5%
william               28       1.5%
master                19       1.0%
henry                 17       0.9%
james                 15       0.8%
john                  14       0.8%
thomas                13       0.7%
george                12       0.7%
Other values (906)    1273     70.3%

Words (Dataset B)
Value                 Count    Frequency (%)
mr                    263      14.4%
miss                  87       4.8%
mrs                   67       3.7%
william               33       1.8%
john                  24       1.3%
henry                 19       1.0%
master                17       0.9%
george                13       0.7%
thomas                11       0.6%
charles               10       0.5%
Other values (918)    1277     70.1%
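The word tables appear to lowercase each token and strip surrounding punctuation. That is why "Mr." is counted as mr, while internal punctuation survives in ticket prefixes such as c.a and w./c. The tokenizer below is inferred from those visible tokens and is not taken from ydata-profiling's source:

```python
import string
from collections import Counter


def word_counts(texts):
    """Split on whitespace, strip leading/trailing punctuation, lowercase."""
    counts = Counter()
    for text in texts:
        for token in text.split():
            word = token.strip(string.punctuation).lower()
            if word:  # a token made only of punctuation is dropped
                counts[word] += 1
    return counts


names = ["Skoog, Mr. Wilhelm", "Kiernan, Mr. Philip", "Vander Planke, Miss. Augusta Maria"]
print(word_counts(names).most_common(1))  # [('mr', 2)]
```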

Most occurring characters

Dataset A
Character            Count    Frequency (%)
(space)              1366     11.4%
r                    931      7.8%
e                    868      7.3%
a                    810      6.8%
s                    679      5.7%
i                    668      5.6%
n                    637      5.3%
M                    568      4.8%
l                    519      4.3%
o                    498      4.2%
Other values (49)    4388     36.8%

Dataset B
Character            Count    Frequency (%)
(space)              1376     11.3%
r                    979      8.0%
e                    893      7.3%
a                    827      6.8%
n                    672      5.5%
s                    658      5.4%
i                    654      5.4%
M                    564      4.6%
l                    551      4.5%
o                    526      4.3%
Other values (50)    4462     36.7%

Most occurring categories

(unknown): 11932 (100.0%) in Dataset A; 12162 (100.0%) in Dataset B

Most frequent character per category

(unknown): same counts as Most occurring characters

Most occurring scripts

(unknown): 11932 (100.0%) in Dataset A; 12162 (100.0%) in Dataset B

Most frequent character per script

(unknown): same counts as Most occurring characters

Most occurring blocks

(unknown): 11932 (100.0%) in Dataset A; 12162 (100.0%) in Dataset B

Most frequent character per block

(unknown): same counts as Most occurring characters

Sex
Categorical

                Dataset A    Dataset B
Distinct        2            2
Distinct (%)    0.4%         0.4%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Dataset A: male (281), female (165)
Dataset B: male (290), female (156)

Length

                 Dataset A    Dataset B
Max length       6            6
Median length    4            4
Mean length      4.7399103    4.6995516
Min length       4            4
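Total characters and mean length follow directly from the category counts. In Dataset A, 281 x 4 (male) + 165 x 6 (female) = 2114 characters over 446 rows, a mean length of about 4.7399103. The sketch below recomputes these values from a hypothetical {category: count} mapping:

```python
def length_stats(counts):
    """Length statistics from a hypothetical {category: count} mapping."""
    n = sum(counts.values())
    total = sum(len(value) * count for value, count in counts.items())
    lengths = [len(value) for value in counts]
    return {
        "total_characters": total,
        "mean_length": total / n,
        "min_length": min(lengths),
        "max_length": max(lengths),
    }


print(length_stats({"male": 281, "female": 165}))  # Dataset A counts
```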

Characters and Unicode

                       Dataset A    Dataset B
Total characters       2114         2096
Distinct characters    5            5
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    male         female
2nd row    male         male
3rd row    male         male
4th row    male         male
5th row    male         female

Common Values

Dataset A
Value     Count    Frequency (%)
male      281      63.0%
female    165      37.0%

Dataset B
Value     Count    Frequency (%)
male      290      65.0%
female    156      35.0%

Length

Figure: histogram of lengths of the category

Common Values (Plot)

Figure: bar charts of the common values for Dataset A and Dataset B

Most occurring characters

Dataset A
Character    Count    Frequency (%)
e            611      28.9%
m            446      21.1%
a            446      21.1%
l            446      21.1%
f            165      7.8%

Dataset B
Character    Count    Frequency (%)
e            602      28.7%
m            446      21.3%
a            446      21.3%
l            446      21.3%
f            156      7.4%

Most occurring categories

(unknown): 2114 (100.0%) in Dataset A; 2096 (100.0%) in Dataset B

Most frequent character per category

(unknown): same counts as Most occurring characters

Most occurring scripts

(unknown): 2114 (100.0%) in Dataset A; 2096 (100.0%) in Dataset B

Most frequent character per script

(unknown): same counts as Most occurring characters

Most occurring blocks

(unknown): 2114 (100.0%) in Dataset A; 2096 (100.0%) in Dataset B

Most frequent character per block

(unknown): same counts as Most occurring characters

Age
Real number (ℝ)

                Dataset A    Dataset B
Distinct        74           70
Distinct (%)    20.7%        19.7%
Missing         89           91
Missing (%)     20.0%        20.4%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            30.12465     29.268789
Minimum         0.75         0.75
Maximum         80           74
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0.75         0.75
5th percentile               4.8          4
Q1                           21           21
Median                       29           28
Q3                           39           37.5
95th percentile              56.2         52.3
Maximum                      80           74
Range                        79.25        73.25
Interquartile range (IQR)    18           16.5

Descriptive statistics

                                   Dataset A       Dataset B
Standard deviation                 14.566542       13.832121
Coefficient of variation (CV)      0.48354227      0.47258947
Kurtosis                           0.26531611      0.17139424
Mean                               30.12465        29.268789
Median Absolute Deviation (MAD)    9               8
Skewness                           0.37389327      0.34414924
Sum                                10754.5         10390.42
Variance                           212.18413       191.32758
Monotonicity                       Not monotonic   Not monotonic
Figure: histogram with fixed size bins (bins=50)
Common Values (Dataset A)
Value                Count    Frequency (%)
25                   16       3.6%
24                   14       3.1%
21                   14       3.1%
18                   14       3.1%
36                   13       2.9%
30                   12       2.7%
22                   12       2.7%
29                   11       2.5%
16                   10       2.2%
26                   10       2.2%
Other values (64)    231      51.8%
(Missing)            89       20.0%

Common Values (Dataset B)
Value                Count    Frequency (%)
24                   18       4.0%
22                   17       3.8%
30                   15       3.4%
28                   14       3.1%
18                   12       2.7%
36                   12       2.7%
26                   12       2.7%
27                   12       2.7%
25                   11       2.5%
21                   11       2.5%
Other values (60)    221      49.6%
(Missing)            91       20.4%
Smallest values (Dataset A)
Value    Count    Frequency (%)
0.75     1        0.2%
0.83     1        0.2%
0.92     1        0.2%
1        6        1.3%
2        4        0.9%
3        2        0.4%
4        3        0.7%
5        2        0.4%
6        2        0.4%
7        1        0.2%

Smallest values (Dataset B)
Value    Count    Frequency (%)
0.75     2        0.4%
0.92     1        0.2%
1        2        0.4%
2        7        1.6%
3        2        0.4%
4        5        1.1%
5        1        0.2%
6        2        0.4%
7        1        0.2%
8        3        0.7%

SibSp
Real number (ℝ)

                Dataset A     Dataset B
Distinct        7             7
Distinct (%)    1.6%          1.6%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.55605381    0.56502242
Minimum         0             0
Maximum         8             8
Zeros           298           304
Zeros (%)       66.8%         68.2%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5th percentile               0            0
Q1                           0            0
Median                       0            0
Q3                           1            1
95th percentile              2.75         3
Maximum                      8            8
Range                        8            8
Interquartile range (IQR)    1            1

Descriptive statistics

                                   Dataset A       Dataset B
Standard deviation                 1.1593282       1.2082249
Coefficient of variation (CV)      2.0849209       2.1383664
Kurtosis                           18.798185       16.374759
Mean                               0.55605381      0.56502242
Median Absolute Deviation (MAD)    0               0
Skewness                           3.8007503       3.6172912
Sum                                248             252
Variance                           1.3440419       1.4598075
Monotonicity                       Not monotonic   Not monotonic
Figure: histogram with fixed size bins (bins=7)
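Fixed size bins divide the range [Minimum, Maximum] into intervals of equal width. The sketch below shows what that means for a list of numbers. It follows `numpy.histogram`'s convention that the last bin also includes the maximum. It is not the report's plotting code and makes no claim about how the report chose bins=7 here.

```python
def fixed_width_histogram(values, bins):
    """Counts per equal-width bin over [min, max]; the last bin is closed."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins if hi > lo else 1.0  # all-equal data: one bin gets everything
    counts = [0] * bins
    for x in values:
        idx = int((x - lo) / width)
        counts[min(idx, bins - 1)] += 1  # clamp the maximum into the last bin
    return counts


print(fixed_width_histogram([0, 0, 1, 2, 8], bins=4))  # [3, 1, 0, 1]
```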
Common Values (Dataset A)
Value    Count    Frequency (%)
0        298      66.8%
1        106      23.8%
2        19       4.3%
3        9        2.0%
4        8        1.8%
8        5        1.1%
5        1        0.2%

Common Values (Dataset B)
Value    Count    Frequency (%)
0        304      68.2%
1        99       22.2%
2        16       3.6%
3        10       2.2%
4        9        2.0%
8        5        1.1%
5        3        0.7%
Smallest values (Dataset A)
Value    Count    Frequency (%)
0        298      66.8%
1        106      23.8%
2        19       4.3%
3        9        2.0%
4        8        1.8%
5        1        0.2%
8        5        1.1%

Smallest values (Dataset B)
Value    Count    Frequency (%)
0        304      68.2%
1        99       22.2%
2        16       3.6%
3        10       2.2%
4        9        2.0%
5        3        0.7%
8        5        1.1%

Parch
Real number (ℝ)

                Dataset A     Dataset B
Distinct        7             6
Distinct (%)    1.6%          1.3%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.39013453    0.38116592
Minimum         0             0
Maximum         6             5
Zeros           341           341
Zeros (%)       76.5%         76.5%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5th percentile               0            0
Q1                           0            0
Median                       0            0
Q3                           0            0
95th percentile              2            2
Maximum                      6            5
Range                        6            5
Interquartile range (IQR)    0            0

Descriptive statistics

                                   Dataset A       Dataset B
Standard deviation                 0.84549911      0.79775217
Coefficient of variation (CV)      2.1671989       2.0929263
Kurtosis                           10.275107       7.4936325
Mean                               0.39013453      0.38116592
Median Absolute Deviation (MAD)    0               0
Skewness                           2.8577391       2.5141442
Sum                                174             170
Variance                           0.71486875      0.63640853
Monotonicity                       Not monotonic   Not monotonic
Figure: histogram with fixed size bins (bins=7)
Common Values (Dataset A)
Value    Count    Frequency (%)
0        341      76.5%
1        57       12.8%
2        38       8.5%
4        4        0.9%
3        3        0.7%
5        2        0.4%
6        1        0.2%

Common Values (Dataset B)
Value    Count    Frequency (%)
0        341      76.5%
1        55       12.3%
2        42       9.4%
4        3        0.7%
3        3        0.7%
5        2        0.4%
Smallest values (Dataset A)
Value    Count    Frequency (%)
0        341      76.5%
1        57       12.8%
2        38       8.5%
3        3        0.7%
4        4        0.9%
5        2        0.4%
6        1        0.2%

Smallest values (Dataset B)
Value    Count    Frequency (%)
0        341      76.5%
1        55       12.3%
2        42       9.4%
3        3        0.7%
4        3        0.7%
5        2        0.4%

Ticket
Text

                Dataset A    Dataset B
Distinct        376          383
Distinct (%)    84.3%        85.9%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

                       Dataset A    Dataset B
Max length             18           18
Median length          17           17
Mean length            6.7488789    6.6345291
Min length             4            3

Characters and Unicode

                       Dataset A    Dataset B
Total characters       3010         2959
Distinct characters    35           35
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

Unique

              Dataset A    Dataset B
Unique        322          342
Unique (%)    72.2%        76.7%
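Distinct counts the different non-missing values. Unique counts the values that occur exactly once. Both percentages use the non-missing rows as the denominator. This matches the Cabin figures, where 87 distinct values out of 109 non-missing rows (446 - 337) give 79.8%. Here is a sketch, assuming None marks a missing value:

```python
from collections import Counter


def distinct_and_unique(values):
    """(distinct, unique, distinct %, unique %) over the non-missing values."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)  # seen exactly once
    n = len(present)
    return distinct, unique, f"{distinct / n:.1%}", f"{unique / n:.1%}"


print(distinct_and_unique(["a", "a", "b", "c", None]))  # (3, 2, '75.0%', '50.0%')
```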

Sample

           Dataset A    Dataset B
1st row    2685         345764
2nd row    347088       367229
3rd row    3101277      234686
4th row    347061       1601
5th row    1601         PC 17611

Words (Dataset A)
Value                 Count    Frequency (%)
pc                    31       5.5%
c.a                   10       1.8%
a/5                   8        1.4%
ca                    7        1.2%
w./c                  7        1.2%
sc/paris              5        0.9%
2343                  5        0.9%
f.c.c                 4        0.7%
ston/o                4        0.7%
2                     4        0.7%
Other values (399)    479      84.9%

Words (Dataset B)
Value                 Count    Frequency (%)
pc                    26       4.7%
c.a                   12       2.2%
a/5                   11       2.0%
ca                    8        1.4%
soton/oq              7        1.3%
2343                  5        0.9%
w./c                  5        0.9%
1601                  5        0.9%
line                  4        0.7%
347082                4        0.7%
Other values (403)    471      84.4%

Most occurring characters

Dataset A
Character            Count    Frequency (%)
3                    383      12.7%
1                    332      11.0%
2                    289      9.6%
7                    259      8.6%
4                    237      7.9%
0                    217      7.2%
6                    209      6.9%
5                    186      6.2%
9                    154      5.1%
8                    146      4.9%
Other values (25)    598      19.9%

Dataset B
Character            Count    Frequency (%)
3                    357      12.1%
1                    341      11.5%
2                    292      9.9%
7                    244      8.2%
4                    221      7.5%
6                    212      7.2%
5                    198      6.7%
0                    192      6.5%
9                    175      5.9%
8                    137      4.6%
Other values (25)    590      19.9%

Most occurring categories

(unknown): 3010 (100.0%) in Dataset A; 2959 (100.0%) in Dataset B

Most frequent character per category

(unknown): same counts as Most occurring characters

Most occurring scripts

(unknown): 3010 (100.0%) in Dataset A; 2959 (100.0%) in Dataset B

Most frequent character per script

(unknown): same counts as Most occurring characters

Most occurring blocks

(unknown): 3010 (100.0%) in Dataset A; 2959 (100.0%) in Dataset B

Most frequent character per block

(unknown): same counts as Most occurring characters

Fare
Real number (ℝ)

                Dataset A    Dataset B
Distinct        173          178
Distinct (%)    38.8%        39.9%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            32.755781    33.830502
Minimum         0            0
Maximum         512.3292     512.3292
Zeros           5            7
Zeros (%)       1.1%         1.6%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5th percentile               7.225        7.225
Q1                           7.9031       7.925
Median                       13.45835     14.45625
Q3                           31.359375    32.875
95th percentile              130.2375     120
Maximum                      512.3292     512.3292
Range                        512.3292     512.3292
Interquartile range (IQR)    23.456275    24.95

Descriptive statistics

                                   Dataset A       Dataset B
Standard deviation                 48.417266       49.688015
Coefficient of variation (CV)      1.4781289       1.4687342
Kurtosis                           27.202454       24.475593
Mean                               32.755781       33.830502
Median Absolute Deviation (MAD)    5.93545         6.92085
Skewness                           4.2338723       4.0366934
Sum                                14609.078       15088.404
Variance                           2344.2317       2468.8988
Monotonicity                       Not monotonic   Not monotonic
Figure: histogram with fixed size bins (bins=50)
Common Values (Dataset A)
Value                 Count    Frequency (%)
7.8958                22       4.9%
13                    19       4.3%
8.05                  19       4.3%
7.75                  16       3.6%
26                    14       3.1%
10.5                  14       3.1%
7.925                 12       2.7%
7.775                 10       2.2%
8.6625                10       2.2%
7.225                 8        1.8%
Other values (163)    302      67.7%

Common Values (Dataset B)
Value                 Count    Frequency (%)
8.05                  24       5.4%
13                    23       5.2%
7.8958                21       4.7%
26                    15       3.4%
10.5                  15       3.4%
7.75                  14       3.1%
26.55                 9        2.0%
7.225                 8        1.8%
7.775                 8        1.8%
7.2292                7        1.6%
Other values (168)    302      67.7%
Smallest values (Dataset A)
Value     Count    Frequency (%)
0         5        1.1%
6.2375    1        0.2%
6.45      1        0.2%
6.75      1        0.2%
6.975     1        0.2%
7.0458    1        0.2%
7.05      4        0.9%
7.0542    1        0.2%
7.1417    1        0.2%
7.225     8        1.8%

Smallest values (Dataset B)
Value     Count    Frequency (%)
0         7        1.6%
5         1        0.2%
6.2375    1        0.2%
6.45      1        0.2%
6.75      2        0.4%
6.95      1        0.2%
7.05      3        0.7%
7.0542    1        0.2%
7.125     1        0.2%
7.1417    1        0.2%

Cabin
Text

                Dataset A    Dataset B
Distinct        87           93
Distinct (%)    79.8%        85.3%
Missing         337          337
Missing (%)     75.6%        75.6%
Memory size     7.0 KiB      7.0 KiB

Length

                       Dataset A    Dataset B
Max length             15           15
Median length          3            3
Mean length            3.6238532    3.6238532
Min length             1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       395          395
Distinct characters    19           19
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

Unique

              Dataset A    Dataset B
Unique        68           79
Unique (%)    62.4%        72.5%

Sample

           Dataset A          Dataset B
1st row    B57 B59 B63 B66    D9
2nd row    A32                B39
3rd row    D11                C52
4th row    E44                B58 B60
5th row    C50                C126

Words (Dataset A)
Value                Count    Frequency (%)
e101                 3        2.3%
f2                   3        2.3%
d                    3        2.3%
f                    3        2.3%
d35                  2        1.6%
e44                  2        1.6%
c123                 2        1.6%
c78                  2        1.6%
c26                  2        1.6%
c22                  2        1.6%
Other values (89)    105      81.4%

Words (Dataset B)
Value                Count    Frequency (%)
g6                   3        2.3%
b98                  3        2.3%
b96                  3        2.3%
c22                  2        1.6%
b49                  2        1.6%
b60                  2        1.6%
b58                  2        1.6%
c78                  2        1.6%
c83                  2        1.6%
c26                  2        1.6%
Other values (96)    105      82.0%

Most occurring characters

Dataset A
Character           Count    Frequency (%)
2                   44       11.1%
C                   38       9.6%
B                   32       8.1%
1                   31       7.8%
3                   30       7.6%
6                   29       7.3%
5                   25       6.3%
E                   21       5.3%
7                   21       5.3%
(space)             20       5.1%
Other values (9)    104      26.3%

Dataset B
Character           Count    Frequency (%)
C                   48       12.2%
2                   38       9.6%
B                   35       8.9%
1                   35       8.9%
6                   33       8.4%
5                   26       6.6%
3                   23       5.8%
8                   23       5.8%
(space)             19       4.8%
9                   19       4.8%
Other values (9)    96       24.3%

Most occurring categories

(unknown): 395 (100.0%) in Dataset A; 395 (100.0%) in Dataset B

Most frequent character per category

(unknown): same counts as Most occurring characters

Most occurring scripts

(unknown): 395 (100.0%) in Dataset A; 395 (100.0%) in Dataset B

Most frequent character per script

(unknown): same counts as Most occurring characters

Most occurring blocks

(unknown): 395 (100.0%) in Dataset A; 395 (100.0%) in Dataset B

Most frequent character per block

(unknown): same counts as Most occurring characters

Embarked
Categorical

                Dataset A    Dataset B
Distinct        3            3
Distinct (%)    0.7%         0.7%
Missing         2            1
Missing (%)     0.4%         0.2%
Memory size     7.0 KiB      7.0 KiB

Dataset A: S (319), C (81), Q (44)
Dataset B: S (328), C (84), Q (33)

Length

                       Dataset A    Dataset B
Max length             1            1
Median length          1            1
Mean length            1            1
Min length             1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       444          445
Distinct characters    3            3
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    C            S
2nd row    S            Q
3rd row    S            S
4th row    S            S
5th row    S            S

Common Values

Dataset A
Value        Count    Frequency (%)
S            319      71.5%
C            81       18.2%
Q            44       9.9%
(Missing)    2        0.4%

Dataset B
Value        Count    Frequency (%)
S            328      73.5%
C            84       18.8%
Q            33       7.4%
(Missing)    1        0.2%
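Two denominators are in play for Embarked. Common Values divides by all 446 rows, so S in Dataset A is 319 / 446, or about 71.5%. The character tables divide by the 444 non-missing characters, giving 319 / 444, or about 71.8%. A quick check:

```python
def share(count, total):
    """Format count / total the way the report does, e.g. '71.5%'."""
    return f"{count / total:.1%}"


counts_a = {"S": 319, "C": 81, "Q": 44}  # Embarked, Dataset A
missing_a = 2
rows_a = sum(counts_a.values()) + missing_a  # 446 rows in total

common = {k: share(v, rows_a) for k, v in counts_a.items()}             # missing rows in the denominator
chars = {k: share(v, rows_a - missing_a) for k, v in counts_a.items()}  # only the 444 characters
print(common["S"], chars["S"])  # 71.5% 71.8%
```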

Length

Figure: histogram of lengths of the category

Common Values (Plot)

Figure: bar charts of the common values for Dataset A and Dataset B

Most occurring characters

Dataset A
Character    Count    Frequency (%)
S            319      71.8%
C            81       18.2%
Q            44       9.9%

Dataset B
Character    Count    Frequency (%)
S            328      73.7%
C            84       18.9%
Q            33       7.4%

Most occurring categories

(unknown): 444 (100.0%) in Dataset A; 445 (100.0%) in Dataset B

Most frequent character per category

(unknown): same counts as Most occurring characters

Most occurring scripts

(unknown): 444 (100.0%) in Dataset A; 445 (100.0%) in Dataset B

Most frequent character per script

(unknown): same counts as Most occurring characters

Most occurring blocks

(unknown): 444 (100.0%) in Dataset A; 445 (100.0%) in Dataset B

Most frequent character per block

(unknown): same counts as Most occurring characters

Interactions

Figure: pairwise interaction plots (25 panels for Dataset A and 25 for Dataset B)

Missing values

[Plots omitted: Dataset A and Dataset B]
A simple visualization of nullity by column.
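The per-column nullity count can be sketched in plain Python. This is a minimal illustration, not ydata-profiling's implementation: the helper name `nullity_by_column` is hypothetical, `None` stands in for the report's NaN, and the five rows are a subset of columns transcribed from the Dataset B sample rows in the Sample section. With pandas, the same count comes from `df.isna().sum()`.

```python
from typing import Any

# Five rows transcribed from the Dataset B sample (subset of columns).
# None plays the role of the report's NaN marker.
ROWS: list[dict[str, Any]] = [
    {"PassengerId": 536, "Age": 7.0, "Cabin": None, "Embarked": "S"},
    {"PassengerId": 561, "Age": None, "Cabin": None, "Embarked": "Q"},
    {"PassengerId": 674, "Age": 31.0, "Cabin": None, "Embarked": "S"},
    {"PassengerId": 710, "Age": None, "Cabin": None, "Embarked": "C"},
    {"PassengerId": 873, "Age": 33.0, "Cabin": "B51 B53 B55", "Embarked": "S"},
]


def nullity_by_column(rows: list[dict[str, Any]]) -> dict[str, tuple[int, float]]:
    """Hypothetical helper: map each column to (missing count, missing %)."""
    columns = list(rows[0].keys()) if rows else []
    result: dict[str, tuple[int, float]] = {}
    for col in columns:
        # A cell counts as missing if the key is absent or its value is None.
        missing = sum(1 for row in rows if row.get(col) is None)
        result[col] = (missing, 100.0 * missing / len(rows))
    return result


for column, (missing, pct) in nullity_by_column(ROWS).items():
    print(f"{column:<12} missing={missing} ({pct:.1f}%)")
```

The bar chart in the report draws the same per-column figures, one bar per variable.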

[Plots omitted: Dataset A and Dataset B]
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completeness.
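A text rendering of the same idea: one line per row, one character per column, `#` where a value is present and `.` where it is missing. This is an illustrative sketch, not the report's plotting code; `nullity_matrix` is a hypothetical helper, and the six rows are a subset of columns transcribed from the Dataset A sample, with `None` standing in for NaN.

```python
from typing import Any

COLUMNS = ["PassengerId", "Age", "Cabin", "Embarked"]

# Six rows transcribed from the Dataset A sample (subset of columns).
# None plays the role of the report's NaN marker.
ROWS: list[dict[str, Any]] = [
    {"PassengerId": 306, "Age": 0.92, "Cabin": "C22 C26", "Embarked": "S"},
    {"PassengerId": 35, "Age": 28.0, "Cabin": None, "Embarked": "C"},
    {"PassengerId": 450, "Age": 52.0, "Cabin": "C104", "Embarked": "S"},
    {"PassengerId": 498, "Age": None, "Cabin": None, "Embarked": "S"},
    {"PassengerId": 138, "Age": 37.0, "Cabin": "C123", "Embarked": "S"},
    {"PassengerId": 159, "Age": None, "Cabin": None, "Embarked": "S"},
]


def nullity_matrix(
    rows: list[dict[str, Any]],
    columns: list[str],
    present: str = "#",
    missing: str = ".",
) -> list[str]:
    """Hypothetical helper: one text line per row, one character per column."""
    return [
        "".join(missing if row.get(col) is None else present for col in columns)
        for row in rows
    ]


for line in nullity_matrix(ROWS, COLUMNS):
    print(line)
```

Reading down a column shows how sparse it is (Cabin here), and rows where Age and Cabin are both missing stand out as repeated `#..#` lines, which is the kind of pattern the matrix plot is meant to surface.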

[Plots omitted: Dataset A and Dataset B]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

Dataset A

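(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
798 | 799 | 0 | 3 | Ibrahim Shawah, Mr. Yousseff | male | 30.0 | 0 | 0 | 2685 | 7.2292 | NaN | C
360 | 361 | 0 | 3 | Skoog, Mr. Wilhelm | male | 40.0 | 1 | 4 | 347088 | 27.9000 | NaN | S
392 | 393 | 0 | 3 | Gustafsson, Mr. Johan Birger | male | 28.0 | 2 | 0 | 3101277 | 7.9250 | NaN | S
129 | 130 | 0 | 3 | Ekstrom, Mr. Johan | male | 45.0 | 0 | 0 | 347061 | 6.9750 | NaN | S
838 | 839 | 1 | 3 | Chip, Mr. Chang | male | 32.0 | 0 | 0 | 1601 | 56.4958 | NaN | S
440 | 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) | female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.2500 | NaN | S
660 | 661 | 1 | 1 | Frauenthal, Dr. Henry William | male | 50.0 | 2 | 0 | PC 17611 | 133.6500 | NaN | S
752 | 753 | 0 | 3 | Vande Velde, Mr. Johannes Joseph | male | 33.0 | 0 | 0 | 345780 | 9.5000 | NaN | S
402 | 403 | 0 | 3 | Jussila, Miss. Mari Aina | female | 21.0 | 1 | 0 | 4137 | 9.8250 | NaN | S
311 | 312 | 1 | 1 | Ryerson, Miss. Emily Borie | female | 18.0 | 2 | 2 | PC 17608 | 262.3750 | B57 B59 B63 B66 | C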

Dataset B

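(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
38 | 39 | 0 | 3 | Vander Planke, Miss. Augusta Maria | female | 18.0 | 2 | 0 | 345764 | 18.0000 | NaN | S
214 | 215 | 0 | 3 | Kiernan, Mr. Philip | male | NaN | 1 | 0 | 367229 | 7.7500 | NaN | Q
666 | 667 | 0 | 2 | Butler, Mr. Reginald Fenton | male | 25.0 | 0 | 0 | 234686 | 13.0000 | NaN | S
74 | 75 | 1 | 3 | Bing, Mr. Lee | male | 32.0 | 0 | 0 | 1601 | 56.4958 | NaN | S
334 | 335 | 1 | 1 | Frauenthal, Mrs. Henry William (Clara Heinsheimer) | female | NaN | 1 | 0 | PC 17611 | 133.6500 | NaN | S
401 | 402 | 0 | 3 | Adams, Mr. John | male | 26.0 | 0 | 0 | 341826 | 8.0500 | NaN | S
40 | 41 | 0 | 3 | Ahlin, Mrs. Johan (Johanna Persdotter Larsson) | female | 40.0 | 1 | 0 | 7546 | 9.4750 | NaN | S
288 | 289 | 1 | 2 | Hosono, Mr. Masabumi | male | 42.0 | 0 | 0 | 237798 | 13.0000 | NaN | S
865 | 866 | 1 | 2 | Bystrom, Mrs. (Karolina) | female | 42.0 | 0 | 0 | 236852 | 13.0000 | NaN | S
695 | 696 | 0 | 2 | Chapman, Mr. Charles Henry | male | 52.0 | 0 | 0 | 248731 | 13.5000 | NaN | S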

Dataset A

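(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
305 | 306 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.92 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S
34 | 35 | 0 | 1 | Meyer, Mr. Edgar Joseph | male | 28.00 | 1 | 0 | PC 17604 | 82.1708 | NaN | C
449 | 450 | 1 | 1 | Peuchen, Major. Arthur Godfrey | male | 52.00 | 0 | 0 | 113786 | 30.5000 | C104 | S
100 | 101 | 0 | 3 | Petranec, Miss. Matilda | female | 28.00 | 0 | 0 | 349245 | 7.8958 | NaN | S
290 | 291 | 1 | 1 | Barber, Miss. Ellen "Nellie" | female | 26.00 | 0 | 0 | 19877 | 78.8500 | NaN | S
469 | 470 | 1 | 3 | Baclini, Miss. Helene Barbara | female | 0.75 | 2 | 1 | 2666 | 19.2583 | NaN | C
69 | 70 | 0 | 3 | Kink, Mr. Vincenz | male | 26.00 | 2 | 0 | 315151 | 8.6625 | NaN | S
497 | 498 | 0 | 3 | Shellard, Mr. Frederick William | male | NaN | 0 | 0 | C.A. 6212 | 15.1000 | NaN | S
137 | 138 | 0 | 1 | Futrelle, Mr. Jacques Heath | male | 37.00 | 1 | 0 | 113803 | 53.1000 | C123 | S
158 | 159 | 0 | 3 | Smiljanic, Mr. Mile | male | NaN | 0 | 0 | 315037 | 8.6625 | NaN | S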

Dataset B

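(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
535 | 536 | 1 | 2 | Hart, Miss. Eva Miriam | female | 7.0 | 0 | 2 | F.C.C. 13529 | 26.2500 | NaN | S
560 | 561 | 0 | 3 | Morrow, Mr. Thomas Rowan | male | NaN | 0 | 0 | 372622 | 7.7500 | NaN | Q
673 | 674 | 1 | 2 | Wilhelms, Mr. Charles | male | 31.0 | 0 | 0 | 244270 | 13.0000 | NaN | S
870 | 871 | 0 | 3 | Balkic, Mr. Cerin | male | 26.0 | 0 | 0 | 349248 | 7.8958 | NaN | S
709 | 710 | 1 | 3 | Moubarek, Master. Halim Gonios ("William George") | male | NaN | 1 | 1 | 2661 | 15.2458 | NaN | C
302 | 303 | 0 | 3 | Johnson, Mr. William Cahoone Jr | male | 19.0 | 0 | 0 | LINE | 0.0000 | NaN | S
489 | 490 | 1 | 3 | Coutts, Master. Eden Leslie "Neville" | male | 9.0 | 1 | 1 | C.A. 37671 | 15.9000 | NaN | S
872 | 873 | 0 | 1 | Carlsson, Mr. Frans Olof | male | 33.0 | 0 | 0 | 695 | 5.0000 | B51 B53 B55 | S
874 | 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C
728 | 729 | 0 | 2 | Bryhl, Mr. Kurt Arnold Gottfrid | male | 25.0 | 1 | 0 | 236853 | 26.0000 | NaN | S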

Duplicate rows

Dataset A

PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates
Dataset does not contain duplicate rows.

Dataset B

PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates
Dataset does not contain duplicate rows.
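A duplicate-row check of this kind can be sketched as counting identical rows. This is a minimal stdlib sketch, not ydata-profiling's implementation: `duplicate_rows` is a hypothetical helper, and the input deliberately repeats one row taken from the Dataset B sample so that the check has something to report. In pandas, `df.duplicated().sum()` counts the repeated copies beyond each first occurrence.

```python
from collections import Counter
from typing import Any

COLUMNS = ("PassengerId", "Name", "Age", "Ticket", "Fare")

# Hypothetical input: two rows from the Dataset B sample, with the first one
# repeated on purpose. None stands in for NaN.
ROWS: list[tuple[Any, ...]] = [
    (39, "Vander Planke, Miss. Augusta Maria", 18.0, "345764", 18.0),
    (215, "Kiernan, Mr. Philip", None, "367229", 7.75),
    (39, "Vander Planke, Miss. Augusta Maria", 18.0, "345764", 18.0),
]


def duplicate_rows(rows: list[tuple[Any, ...]]) -> dict[tuple[Any, ...], int]:
    """Hypothetical helper: map every row seen more than once to its occurrence count."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


dups = duplicate_rows(ROWS)
if dups:
    for row, n in dups.items():
        print(dict(zip(COLUMNS, row)), "# duplicates:", n)
else:
    print("Dataset does not contain duplicate rows.")
```

Missing values are modeled as `None` rather than `float("nan")` on purpose: two separately created NaN values compare unequal, so rows containing them would never be matched by a plain tuple comparison.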